Examining the Seattle and San Francisco crime data for the summer of 2014, as reported by the cities, I noticed that San Francisco, despite having a population less than 30 percent greater than the population of Seattle, had more than twice as many recorded cases of drug crime. Does this reflect a greater rate of use of certain drugs in San Francisco than in Seattle, or, rather, perhaps a difference in the manner of enforcement of drug laws or the manner in which reported infractions are recorded? Digging somewhat deeper—examining the data from 2013 through 2017 and on the use of certain popular drugs—I uncovered some striking trends and obtained some possible insight into these questions.

Thus examining the Seattle and San Francisco crime data from 2013 through 2017, one of the first things I noticed was a dramatic increase in the number of recorded incidents of crime. For example, in 2017, the number of reported cases of drug crime in Seattle were, in contrast to the numbers for 2014, greater than the number recorded cases of drug crime in San Francisco. As the following plot of incidents of crime within some selected categories suggests, the fractional increase in recorded incidents of crime in Seattle is roughly the same across the various categories of crime.

The San Francisco dataset was downloaded on 2/7/2018, at: https://data.sfgov.org/Public-Safety/Police-Department-Incidents-Current-Year-2017-/9v2m-8wqu

Seattle dataset was downloaded on 2/7/2018, at: https://data.seattle.gov/Public-Safety/Seattle-Police-Department-Police-Report-Incident/7ais-f98f/data

knitr::opts_chunk$set(warning = FALSE, message = FALSE)
SF_complete <- read_csv("SF_Complete.csv")

#With guess max large enough, columns that contain integers that exceed the 
#32-bit maximum are read as character vectors.
seattle_complete <- read_csv("seattle_complete.csv", guess_max = 100000)
library(tidyverse)
library(lubridate)

SF_complete <- SF_complete %>%
  mutate(Year = as.integer(str_sub(Date, -4, -1))) %>%
  mutate(Category = fct_recode(Category,
        "Drugs" = "DRUG/NARCOTIC"))

seattle_complete <- seattle_complete %>%
  rename(Category = `Summarized Offense Description`) %>%
  mutate(Category = fct_recode(Category,
        "Prostitution" = "PROSTITUTION",
        "Drugs" = "NARCOTICS",
        "Weapons" = "WEAPON",
        "Liquor\nlaws" = "LIQUOR VIOLATION",
        "Assault" = "ASSAULT",
        "Homicide" = "HOMICIDE",
        "Robbery" = "ROBBERY",
        "Vehicle\ntheft" = "VEHICLE THEFT",
        "Theft\nfrom\nvehicle" = "CAR PROWL"))

seattle_complete <- seattle_complete %>%
  mutate(R_date = floor_date(mdy_hms(`Occurred Date or Date Range Start`,
                                     tz = "PST8PDT"), "day"))

SF_complete <- SF_complete %>%
  mutate(R_date = mdy(Date, tz = "PST8PDT"))

seattle_complete %>%
  filter(Category %in% 
        c("Prostitution",
        "Drugs",
        "Weapons",
        "Assault",
        "Homicide",
        "Robbery",
        "Vehicle\ntheft",
        "Theft\nfrom\nvehicle")) %>%
  filter(Year %in% 2013:2017) %>%
  ggplot(aes(x = Category, fill = as.factor(Year))) +
  geom_bar(position = "dodge") +
  ggtitle("Recorded Cases of Selected Categories of Crime: Seattle") +
  theme(plot.title = element_text(hjust = 0.5)) +
  labs(fill = "Year")

San Francisco has seen no such increase in the number of recorded incidents. Furthermore, there is no evidence of the massive crime wave in Seattle that this data would suggest if the increase were taken to reflect the actual increase in crime. For example, there are reports of recent increases of use of some drugs, particularly heroin, but nothing close to the more than 100 percent increase in use, from one year to the next, that the data on drug crime in Seattle might seem to suggest.

Looking somewhat deeper at the recorded cases of illegal drug use in the two cities, let’s consider the differences between crimes that would be considered trafficking and those that would be considered possession.

library(plotly)

seattle_drugs_13_17 <- seattle_complete %>%
  filter(Category == "Drugs" & Year %in% c(2013:2017))

SF_drugs_13_17 <- SF_complete %>%
  mutate(`Cited or Arrested` = (str_detect(Resolution, "ARREST") | 
                                str_detect(Resolution, "CITED"))) %>%
  filter(Category == "Drugs" & Year %in% c(2013:2017))

SF_drugs_13_17 <- SF_drugs_13_17 %>% 
  mutate(`Drug Offense Type` = 
          ifelse(str_detect(Descript, "LAB APPARATUS"), "Sales, Transportation, or Production",
              ifelse(str_detect(Descript, "POSSESSION"), "Possession",
              ifelse(str_detect(Descript, "SALE"), "Sales, Transportation, or Production",
              ifelse(str_detect(Descript, "TRANSPORT"), "Sales, Transportation, or Production",
              ifelse(str_detect(Descript, "PLANTING/CULTIVATING MARIJUANA"), 
                     "Sales, Transportation, or Production",
              "Other"
              ))))))

seattle_drugs_13_17 <- seattle_drugs_13_17 %>%
  mutate(`Drug Offense Type` = 
          ifelse(str_detect(`Offense Type`, "POSSESS"), "Possession",
              ifelse(str_detect(`Offense Type`, "FOUND"), "Possession",
              ifelse(str_detect(`Offense Type`, "DISTRIBUTE"), 
                     "Sales, Transportation, or Production",
              ifelse(str_detect(`Offense Type`, "SMUGGLE"), 
                     "Sales, Transportation, or Production",
              ifelse(str_detect(`Offense Type`, "PRODUCE"), 
                     "Sales, Transportation, or Production",
              ifelse(str_detect(`Offense Type`, "SELL"), 
                     "Sales, Transportation, or Production",
              ifelse(str_detect(`Offense Type`, "TRAFFIC"), 
                     "Sales, Transportation, or Production",
              "Other"))))))))
  

joined_counts <- (seattle_drugs_13_17 %>% count(Year, `Drug Offense Type`)) %>%
  left_join((SF_drugs_13_17 %>% count(Year, `Drug Offense Type`)), by = c("Year", "Drug Offense Type")) %>%
  gather(n.x, n.y, key = City, value = Incidents) %>%
  mutate(City = fct_recode(City,
                           Seattle = "n.x",
                           `San Francisco` = "n.y"))

#reformat some categories that will appear in the plot
joined_counts <- joined_counts %>% 
  mutate(`Drug Offense Type` = as.factor(`Drug Offense Type`)) %>%
  mutate(`Drug Offense Type` = 
           fct_recode(`Drug Offense Type`,
                      " Sales, Transportation,\nor Production" = "Sales, Transportation, or Production",
                      " Possession" = "Possession"))

plot <- ggplot(joined_counts %>% filter(`Drug Offense Type` != "Other")) +
  geom_col(aes(x = Year, y = Incidents, fill = City, color = `Drug Offense Type`),
                        position = "dodge") +
  theme(legend.title = element_blank()) +
  labs(x = "year", y = "count",
       title = "Drug-related Police Incidents: possession versus trafficking")

ggplotly(plot) %>% layout(margin = list(b = 50, l = 60, r = 10, t = 80))

We see that, unlike the Seattle data, the San Francisco data show a steady decrease in recorded drug-crime incidents from one year to the next. The recorded incidents for Seattle show similar fractional decreases from 2013 to 2014, but then, as with the data related to other categories of crime, begin to increase significantly after that. Let’s look at the data for particular drugs, and consider it together with the totals for all of the recorded instances of crime.

seattle_drugs_13_17 <- seattle_drugs_13_17 %>%
  mutate(`Drug Type` = ifelse(str_detect(`Offense Type`, "MARIJU"), "Marijuana",    #yes
                            ifelse(str_detect(`Offense Type`, "METH"), "Methamphetamine",    #yes
                            ifelse(str_detect(`Offense Type`, "COCAINE"), "Cocaine",  #yes 
                            ifelse(str_detect(`Offense Type`, "HEROIN"), "Heroin",  #yes
                            ifelse(str_detect(`Offense Type`, "PRESCRIPTION"), "Prescription", #?
                            ifelse(str_detect(`Offense Type`, "PILL/TABLET"), "Pill/Tablet",  #?
                            ifelse(str_detect(`Offense Type`, "HALLUCINOGEN"), "Hallucinogen",
                            ifelse(str_detect(`Offense Type`, "SYNTHETIC"), "Synthetic",  #?
                            ifelse(str_detect(`Offense Type`, "AMPHETAMINE"), "Amphetamine", #yes
                            ifelse(str_detect(`Offense Type`, "OPIUM"), "Opium", #yes
                            ifelse(str_detect(`Offense Type`, "PARAPHENALIA"), "Paraphernalia", #yes
                            "Other")
                            )))))))))))
  
SF_drugs_13_17 <- SF_drugs_13_17 %>%
  mutate(`Drug Type` = ifelse(str_detect(Descript, "MARIJUANA"), "Marijuana",   #yes
                            ifelse(str_detect(Descript, "COCAINE"), "Cocaine",
                            ifelse(str_detect(Descript, "METH-AMPHETAMINE"), "Methamphetamine",
                            ifelse(str_detect(Descript, "BARBITUATES"), "Barbituates",
                            ifelse(str_detect(Descript, "CONTROLLED SUBSTANCE"), 
                                              "Controlled Substance",
                            ifelse(str_detect(Descript, "HALLUCINOGENIC"), "Hallucinogenic",
                            ifelse(str_detect(Descript, "AMPHETAMINE"), "Amphetamine", 
                            ifelse(str_detect(Descript, "METHADONE"), "Methadone",
                            ifelse(str_detect(Descript, "PARAPHERNALIA"), "Paraphernalia",
                            ifelse(str_detect(Descript, "OPIATES"), "Opiates",
                            ifelse(str_detect(Descript, "OPIUM"), "Opium",
                            ifelse(str_detect(Descript, "HEROIN"), "Heroin",
                            "Other")
                            ))))))))))))
library(lubridate)
library(gridExtra)

#fig.height = 2, fig.width = 3

#seattle_drugs_13_17 <- seattle_drugs_13_17 %>%
#  mutate(R_date = floor_date(mdy_hms(`Occurred Date or Date Range Start`,
#                                     tz = "PST8PDT"), "day"))

#SF_drugs_13_17 <- SF_drugs_13_17 %>%
#  mutate(R_date = mdy(Date, tz = "PST8PDT"))

seattle_selected_drugs <- seattle_drugs_13_17 %>%
  filter(`Drug Type` %in% c("Marijuana", "Cocaine", "Methamphetamine", "Heroin"))

SF_selected_drugs <- SF_drugs_13_17 %>%
  filter(`Drug Type` %in% c("Marijuana", "Cocaine", "Methamphetamine", "Heroin"))

number_of_bins <- 78

plt_seattle_selected_drugs <- ggplot(seattle_selected_drugs) + 
  geom_freqpoly(aes(x = R_date, color = `Drug Type`), bins = number_of_bins) +
  scale_x_datetime(date_breaks = "4 months", 
                   limits = c(as.POSIXct("2012/12/31", tz = "PST8PDT"), 
                              as.POSIXct("2018/01/01", tz = "PST8PDT"))) +
  theme(axis.text.x = element_text(angle = 65, hjust = 1, vjust = 1)) +
  labs(y = "number per fortnight", x = "",
       title = "Seattle Police Incidents Related to 4 Popular Drugs",
       color = "Drug: ") +
  theme(legend.position = "bottom",
        plot.title = element_text(size = 14),
        axis.title.y = element_text(size = 12),
        axis.text.x = element_text(size = 10),
        legend.text = element_text(size = 10),
        legend.title = element_text(size = 12)) +
  guides(size = guide_legend(order = 2))

plt_SF_selected_drugs <- ggplot(SF_selected_drugs) +
  geom_freqpoly(aes(x = R_date, color = `Drug Type`), bins = number_of_bins) +
  scale_x_datetime(date_breaks = "4 months", 
                   limits = c(as.POSIXct("2012/12/31", tz = "PST8PDT"), 
                              as.POSIXct("2018/01/01", tz = "PST8PDT"))) +
  theme(axis.text.x = element_text(angle = 65, hjust = 1, vjust = 1)) +
  labs(y = "number per fortnight", x = "",
       title = "San Francisco Police Incidents Related to 4 Popular Drugs") +
  theme(legend.position = "none",
        plot.title = element_text(size = 14),
        axis.title.y = element_text(size = 12),
        axis.text.x = element_text(size = 10))

plt_seattle_all_13_17 <- seattle_complete %>% 
  filter(Year %in% 2013:2017) %>%
  ggplot(aes(x = R_date)) + geom_freqpoly(bins = number_of_bins) +
  scale_x_datetime(date_breaks = "4 months", 
                   limits = c(as.POSIXct("2012/12/31", tz = "PST8PDT"), 
                              as.POSIXct("2018/01/01", tz = "PST8PDT"))) +
  theme(axis.text.x = element_text(angle = 65, hjust = 1, vjust = 1)) +
  labs(y = "number per fortnight", x = "",
       title = "Seattle Police Incidents: 2013 through 2017") +
  theme(plot.title = element_text(size = 14),
        axis.title.y = element_text(size = 12),
        axis.text.x = element_text(size = 10))

plt_SF_all_13_17 <- SF_complete %>%
  filter(Year %in% 2013:2017) %>%
  ggplot(aes(x = R_date)) + geom_freqpoly(bins = number_of_bins) +
  scale_x_datetime(date_breaks = "4 months", 
                   limits = c(as.POSIXct("2012/12/31", tz = "PST8PDT"), 
                              as.POSIXct("2018/01/01", tz = "PST8PDT"))) +
  theme(axis.text.x = element_text(angle = 65, hjust = 1, vjust = 1)) +
  labs(y = "number per fortnight", x = "",
       title = "San Francisco Police Incidents: 2013 through 2017") +
  theme(plot.title = element_text(size = 14),
        axis.title.y = element_text(size = 12),
        axis.text.x = element_text(size = 10))

grid.arrange(plt_seattle_all_13_17, plt_SF_all_13_17, 
             plt_SF_selected_drugs, plt_seattle_selected_drugs,  
             layout_matrix = rbind(c(1, 2),
                                   c(4, 3),
                                   c(4, NA)),
             heights = c(4, 6, .7))